ABSTRACT

This paper reports the results of an investigation conducted at NASA/JSC
into sampling unit size considerations that support timely, global
estimates of crop acreages using remotely sensed (satellite-based) data.
Insight into the optimal sampling unit size was obtained by statistically
modeling the variance of the crop acreage as a function of the sampling
unit size, in conjunction with considerations of cost and of measurement
difficulty (crop identification at the sampling unit level). Results of
the investigation are reported for sampling units ranging in size from
less than two acres up to the county level.

1.0 Introduction

The first systematic attempt to collect agricultural statistics dates
back more than a century to the Census of 1840 (Benedict, 1939). From that
date forward an increasing volume of agricultural statistics has been
collected periodically in Census enumerations, decennially to 1920 and
quinquennially thereafter. A rudimentary system of annual agricultural
estimation was also begun about 1840 in the Patent Office. Upon Commissioner
Ellsworth's resignation in 1845, however, interest in agricultural statistics
subsided in the Patent Office, and it was not until after the Department of
Agriculture was organized in 1862 that annual intercensus estimates were
revived (Ebhling, 1939). Current monthly reports on crop conditions
also predated the establishment of the Department of Agriculture by a few
months. Orange Judd, editor of the American Agriculturalist, published
summaries of crop condition reports submitted voluntarily by subscribers to
his paper for the five months May through September 1862 (Ebhling, 1939).
Judd's efforts were the forerunner to the Department's program of monthly
reports on crop prospects, which have been issued regularly during the growing
season since the first publication in July 1863.

1. This research was carried out under the auspices of the JSC/NRC Post-Doc 
Research Associateship. 
Since 1863, the estimating work of the Department of Agriculture has
expanded greatly, and today a large volume of agricultural estimates is
published on a current basis. The substantial expansion in the volume of
agricultural estimates has not been paralleled by major improvements in
estimating methods. This is somewhat distressing in view of the significant
developments in the theory of sample design, particularly in the past 40
years. Until recent efforts of the USDA Statistical Reporting Service (now
part of the Economics, Statistics, and Cooperative Services) and the Large
Area Crop Inventory Experiment (LACIE) conducted at NASA/JSC in Houston,
Texas (refs. 9, 10, 11, and 14), the predominant method involved the use of
mailed inquiries for collection of basic data and an assortment of techniques
to remove bias in the transformation of basic data into published estimates.
Beginning in 1974, satellite remote sensing technology developed in the
previous decade was combined with statistical survey methodology in an
experimental crop inventory system (LACIE) and tested for wheat in several
countries. This experiment concluded with the LACIE Symposium conducted at
NASA/JSC in October 1978 (ref. 14). For details of the sampling strategy
utilized in LACIE, refer to the Proceedings of the LACIE Symposium or to the
paper by Chhikara and Feiveson in last year's Proceedings of the Annual
Meeting of the ASA (ref. 3), held in San Diego.

In seeking to improve the efficiency of crop area estimation, the choice
of the optimal sampling unit size has been a subject of much discussion at
NASA/JSC. The purpose of this paper is to report preliminary results of
the sampling unit size investigation, ongoing at NASA/JSC, that supports
timely estimates on a global basis of crop acreages utilizing remotely sensed
(satellite-acquired) data. The approach taken is one of modeling the acreage
variance as a function of sampling unit size based on studies by Smith (1938),
Mahalanobis (1940), Jessen (1942), Cochran (1942), Hansen and Hurwitz (1942),
and Asthana (1950). The sampling units investigated in these earlier studies
ranged in size from several square feet up to approximately forty acres. This
paper reports the results of variance modeling for sampling units up to
approximately 25,000 acres in size. Finally, this modeled relation is
utilized in arriving at a closed-form solution to the optimal sampling unit
size that minimizes cost.

2.0 The Sampling Unit Utilized in LACIE 

It was decided at the outset of LACIE that sampling of areas was not 
only desirable but essential. It became apparent that the conversion of the 
satellite-acquired spectral measurements to wheat acreage estimates could 
not be accomplished by an automatic computerized procedure but had to be done 
with the participation of human intelligence (photograph interpretation by 
analyst-interpreters). The time-cost element of this participation had to 
be assessed against the efficiency of LACIE sampling techniques. It was found
that the sampling error (approximately 2 percent) resulting from quite
moderate sampling fractions (approximately 3 percent) was comparable to, if
not smaller than, the percentage error resulting from measurements.
Cost-effectiveness and measurement considerations played a major role in
dictating the sampling unit size selected at the outset of LACIE.

For various reasons, it was impractical to consider using sampling units 
as small as one acre in size. Instead, LACIE decided to use an area unit 
and record the spectral measurements for all resolution elements within the 
area unit as the sample information. The size of the selected sampling area 
was 5 by 6 nautical miles. It may be argued that this unit is too large from 
the standpoint of sampling efficiency (it contains approximately 25,000 acres). 
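
As a rough check on the segment size quoted above, the conversion from 5 by 6 nautical miles to acres can be sketched as follows. The conversion constants (1,852 m per international nautical mile and 4,046.8564224 square meters per acre) are assumptions supplied here, not values stated in the paper; other definitions of the nautical mile shift the result slightly.

```python
# Rough area check for a 5 x 6 nautical mile segment.
# Assumed constants (not from the paper): international nautical mile and acre.
METERS_PER_NAUTICAL_MILE = 1852.0
SQUARE_METERS_PER_ACRE = 4046.8564224


def segment_area_acres(width_nmi, height_nmi):
    """Return the area in acres of a rectangle whose sides are given in nautical miles."""
    width_m = width_nmi * METERS_PER_NAUTICAL_MILE
    height_m = height_nmi * METERS_PER_NAUTICAL_MILE
    return width_m * height_m / SQUARE_METERS_PER_ACRE


area = segment_area_acres(5, 6)
print(f"5 x 6 nautical miles is about {area:,.0f} acres")
```

Under these assumed constants the result falls between 25,000 and 26,000 acres, in line with the "approximately 25,000 acres" cited for the LACIE segment.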
The size of this unit may not be optimum; however, the following practical
considerations dictated the use of a unit of at least a comparable size:

1. It was necessary to register the data acquired for a segment during the
various passages of the satellite over that segment. The technology of
identifying the same segment in these various passages requires key points
within the segment that are easily recognizable and, in turn, this requires
a segment of an adequate size.

2. Again, the satellite imagery and its interpretation by the analysts,
as well as the computation of signatures custom-made for the segment, require
an adequate size, as does the measurement procedure.

3. LACIE addressed the problem of how the variance of the statistical 
sample could be reduced by using areas of smaller size; the gains did not 
justify changing from the above segment size to a much smaller area in view 
of the aforementioned and other practical limitations. 

With future plans for system capabilities that permit a relaxation of many
of the constraints that existed in LACIE, additional consideration can be
given to alternative sampling unit sizes, which are the subject of the
remainder of this paper.

3.0 Model Form Selected for Investigation 

The guiding theory for selecting the proper size of cluster has been
investigated by a number of statisticians. Several attempts have been made
to work out the relationship between the variance of the mean of a single
cluster and its size. The first was due to Fairfield Smith (1938), who
found the relationship to be satisfactory on yield data for plots of
different sizes. Jessen (1942) showed that most economic characters relating
to farm data follow a slightly different law from that of Fairfield Smith. He
postulated that the mean square among elements within a cluster is a
monotonic increasing function of the size of the cluster. The same
relationship developed by Jessen was independently suggested by Mahalanobis
(1940). This was also the finding of Asthana (1950), who fitted Jessen's law
to describe the mean square within clusters for acreage under wheat for a
large number of villages. The algebraic solution of the problem of choosing
the optimum number and size of clusters was given by Cochran (1942),
confirming the conclusions based on Jessen's empirical calculations. That
Jessen's approach was not universally applicable was soon evidenced when
Hansen and Hurwitz (1942) presented examples which showed that for certain
items in urban sampling the variance function was quite different from that
used by Jessen. In any case, the success of these studies dictated our choice
of model and the subsequent investigation in this paper.

The above studies indicated that the power function is a
strong candidate for providing a simple yet satisfactory mathematical model
for the functional dependence of the population unit-to-unit variance on the
sampling unit size. The sampling units in these earlier studies
were limited to sizes ranging from several square feet to approximately 40
acres. This paper investigates the utility of the power function in modeling
the variance as a function of sampling unit size for units ranging all the
way up to more than 25,000 acres.

The remaining sections of this report cover the approach used to determine
the model fit, an evaluation of the model using ground truth data collected
from the 1977-78 wheat crop year of the Large Area Crop Inventory Experiment
in the U.S. Great Plains, and, finally, derivation of the optimal sampling
unit size under certain cost considerations.

4.0 Approach for Estimation of Model Parameters 

This section gives a brief description of the analysis of variance
techniques (see Cochran [1977]) used to obtain estimates of the
cluster-to-cluster wheat area variance for different size clusters and the
approach used to fit the power function. In the following discussion, let N
denote the total number of 5 by 6 nautical mile segments constituting the
sampling frame (i.e., the agricultural area of a stratum) and consider each
to be further subdivided into M subunits of equal size (discounting leftover
areas). Finally, let n denote the number of segments in a random sample from
the stratum, let $A_{ij}$ denote the crop area in subunit j (j=1,...,M) of
segment i (i=1,...,n), and let $\bar{A}_{i\cdot}$ and $\bar{A}_{\cdot\cdot}$
denote the mean crop area per subunit in segment i and in the whole sample,
respectively. Then $S_b^2$, $S_w^2$, and $S^2$ provide unbiased estimates for
$\sigma_b^2$, $\sigma_w^2$, and $\sigma^2$, respectively (see Cochran [1977]),
where:

$$S_b^2 = \frac{\sum_{i=1}^{n} \sum_{j=1}^{M} (\bar{A}_{i\cdot} - \bar{A}_{\cdot\cdot})^2}{n - 1} \qquad (4.1)$$

$$S_w^2 = \frac{\sum_{i=1}^{n} \sum_{j=1}^{M} (A_{ij} - \bar{A}_{i\cdot})^2}{n(M - 1)} \qquad (4.2)$$

$$S^2 = \frac{N - 1}{NM - 1} S_b^2 + \frac{N(M - 1)}{NM - 1} S_w^2 \qquad (4.3)$$

Historically (refs. 1, 4, 7, 8, 12, and 13), the model

$$S^2(x) = A x^B \qquad (4.4)$$

has been found to work quite well in relating the areal subunit size, x, to
the subunit-to-subunit crop area variance, $S^2(x)$ (A and B are estimated
parameters). Using the 5 by 6 nautical mile data collected from the 1977-78
wheat crop in the U.S. Great Plains for input to equations (4.1) - (4.3),
A and B in (4.4) were estimated by the method of least squares.
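
The analysis of variance step in equations (4.1) - (4.3) can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure actually run at NASA/JSC: the function name, the input layout (one list of M subunit crop areas per sampled segment), and the toy numbers are all hypothetical.

```python
from statistics import mean


def anova_variance_components(areas, N):
    """Return (S_b^2, S_w^2, S^2) as in equations (4.1)-(4.3).

    areas: list of n sampled segments, each a list of M subunit crop areas.
    N:     number of segments in the sampling frame (hypothetical argument name).
    """
    n = len(areas)
    M = len(areas[0])
    segment_means = [mean(row) for row in areas]
    # With equal M in every segment, the mean of segment means is the overall mean.
    grand_mean = mean(segment_means)

    # (4.1): between-segment mean square; the inner sum over j contributes the factor M.
    s_b2 = M * sum((m - grand_mean) ** 2 for m in segment_means) / (n - 1)
    # (4.2): within-segment mean square.
    s_w2 = sum(
        (a - m) ** 2 for row, m in zip(areas, segment_means) for a in row
    ) / (n * (M - 1))
    # (4.3): combined subunit-to-subunit variance.
    s2 = ((N - 1) * s_b2 + N * (M - 1) * s_w2) / (N * M - 1)
    return s_b2, s_w2, s2


# Hypothetical toy data: 4 sampled segments, each split into 3 subunits.
toy_areas = [[10.0, 12.0, 14.0], [20.0, 22.0, 27.0], [5.0, 9.0, 7.0], [15.0, 15.0, 18.0]]
s_b2, s_w2, s2 = anova_variance_components(toy_areas, N=40)
print(s_b2, s_w2, s2)
```

Repeating the computation for several subdivisions of the same segments (different M, hence different subunit size x) would give the variance values to which the power function (4.4) is fitted.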


5.0 Evaluation of Fitted Model

Digitized ground truth for a random sample of 124 5 by 6 nautical mile
segments from nine states (see Table 5.1) was utilized in equations (4.1) -
(4.3) to estimate A and B in (4.4) for subunits ranging in size from 171
to 25,426 acres.

    STATE            NUMBER OF DIGITIZED SEGMENTS
    COLORADO                       9
    KANSAS                        13
    MINNESOTA                     13
    MONTANA                       18
    NEBRASKA                      15
    NORTH DAKOTA                  19
    OKLAHOMA                      13
    SOUTH DAKOTA                  15
    TEXAS                          9
    TOTAL                        124

    Table 5.1: Summary of Data by State

Estimates of the variance using the fitted equation were in close agreement
with the estimates obtained from the analysis of variance technique, with
coefficients of determination very close to one for all states. The relative
errors, sum of relative errors, and the mean of the absolute relative errors
were all negligibly small for each state. The subunit-to-subunit variance was
also estimated directly from the data set for other subunit sizes not used in
the estimation of A and B. These estimates also proved to be in very close
agreement with the projected values estimated from the fitted models. Table
5.2 summarizes the estimates of A and B for each of the nine states. Table 5.3
details the results for Texas (similar results were obtained for the remaining
8 states investigated). Assuming equal costs per sampling unit, Table 5.4
summarizes the 9-state allocation (under a Neyman allocation) and sampling
rate results as a function of the sampling cluster size. The allocation
formula is discussed in Appendix A.

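
The fitting and checking steps described above can be sketched as follows. The paper states only that A and B were estimated by least squares; fitting a straight line to log S^2 against log x is one common way to do this and is an assumption here, as are the definition of relative error as (fitted - observed) / observed and the synthetic input values.

```python
import math


def fit_power_law(sizes, variances):
    """Least-squares fit of log S^2 = log A + B log x; returns (A, B)."""
    lx = [math.log(x) for x in sizes]
    ly = [math.log(v) for v in variances]
    count = len(lx)
    mx, my = sum(lx) / count, sum(ly) / count
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (w - my) for u, w in zip(lx, ly))
    B = sxy / sxx
    A = math.exp(my - B * mx)
    return A, B


def fit_diagnostics(sizes, variances, A, B):
    """Return (R^2 on the log scale, relative errors, mean absolute relative error)."""
    fitted = [A * x ** B for x in sizes]
    rel = [(f - v) / v for f, v in zip(fitted, variances)]
    ly = [math.log(v) for v in variances]
    my = sum(ly) / len(ly)
    ss_tot = sum((w - my) ** 2 for w in ly)
    ss_res = sum((w - math.log(f)) ** 2 for w, f in zip(ly, fitted))
    r2 = 1.0 - ss_res / ss_tot
    return r2, rel, sum(abs(r) for r in rel) / len(rel)


# Synthetic, noise-free illustration (hypothetical A = 2.5 and B = 1.7).
demo_sizes = [171.0, 1019.0, 2546.0, 12732.0, 25426.0]
demo_variances = [2.5 * x ** 1.7 for x in demo_sizes]
A_hat, B_hat = fit_power_law(demo_sizes, demo_variances)
r2, rel_errors, mean_abs_rel = fit_diagnostics(demo_sizes, demo_variances, A_hat, B_hat)
print(A_hat, B_hat, r2, mean_abs_rel)
```

Subunit sizes held out of the fit can be checked the same way by passing them to `fit_diagnostics` with the already estimated A and B.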
Table 5.2: State-Level Estimates of A and B in $S^2(x) = A x^B$
[Table contents not recoverable from the source.]

Table 5.3: Summary of Results for Texas
[Table contents not recoverable from the source.]

    CLUSTER SIZE    CLUSTER SIZE AS PERCENT     TOTAL
    IN ACRES        OF 5x6 N.MI. SEGMENT        ALLOCATION
    25,463          100%                            487
    22,918          90%                             501
    20,371          80%                             517
    17,825          70%                             536
    15,278          60%                             559
    12,732          50%                             587
    10,185          40%                             624
     7,639          30%                             674
     5,092          20%                             753
     2,546          10%                             908
     1,019          4%                            1,163
       113          .0045%                        2,108
      1.13          .000045%                      7,325
    [Sampling rate column not recoverable from the source.]

Table 5.4: The Estimated Total U.S. Allocation and Sampling Rate as a
Function of Sampling Cluster Size

Under stratified random sampling, the acreage estimator, $\hat{A}$, has the
form

$$\hat{A} = \sum_{j=1}^{L} \frac{N_j}{n_j} \sum_{i=1}^{n_j} A_{ij} \qquad (5.1)$$

where

    $L$ = the total number of strata,
    $n_j$ = the number of sampling units selected from stratum j,
    $A_{ij}$ = the crop acreage estimate for the ith sampling unit in stratum j,

and

    $N_j$ = the total number of sampling units in the sampling frame of
            stratum j.

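
The estimator in equation (5.1) expands each stratum's sample total by the ratio of frame units to sampled units and then sums over strata. A minimal sketch, with a hypothetical function name, input layout, and numbers:

```python
def stratified_acreage_estimate(strata):
    """Expansion estimator of equation (5.1).

    strata: list of (N_j, sample_j) pairs, where N_j is the number of sampling
            units in the frame of stratum j and sample_j lists the crop acreage
            estimates for the n_j sampled units (hypothetical input layout).
    """
    total = 0.0
    for frame_units, sample in strata:
        n_j = len(sample)
        total += frame_units / n_j * sum(sample)
    return total


# Hypothetical two-stratum example.
example_strata = [
    (100, [50.0, 70.0]),              # N_1 = 100, n_1 = 2
    (40, [10.0, 20.0, 30.0, 20.0]),   # N_2 = 40,  n_2 = 4
]
print(stratified_acreage_estimate(example_strata))
```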
Similarly, from (5.1), the variance, $\sigma_{\hat{A}}^2$, of $\hat{A}$ is
given by

$$\sigma_{\hat{A}}^2 = \sum_{j=1}^{L} N_j^2 \left(1 - \frac{n_j}{N_j}\right) \frac{\sigma_{A_j}^2}{n_j} \approx \sum_{j=1}^{L} N_j^2 \frac{\sigma_{A_j}^2}{n_j} \qquad (5.2)$$

where the approximation neglects the finite population correction. Replacing
$N_j$ and $\sigma_{A_j}^2$ in (5.2) with

$$N_j = \frac{A_j}{x_j} \qquad (5.3)$$

and

$$\sigma_{A_j}^2 = a_j x_j^{b_j} \qquad (5.4)$$

where

    $A_j$ = the total area of the sampling frame in the jth stratum,
    $x_j$ = the total area of each sampling unit in stratum j,

and $a_j$ and $b_j$ are parameters estimated using the approach discussed
earlier, $\sigma_{\hat{A}}^2$ takes the form

$$\sigma_{\hat{A}}^2 = \sum_{j=1}^{L} \frac{1}{n_j} a_j A_j^2 x_j^{b_j - 2} \qquad (5.5)$$
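
The substitution leading to (5.5) can be checked numerically. The sketch below evaluates the variance both ways, once from (5.2) without the finite population correction and once from (5.5), for hypothetical stratum values; function and argument names are illustrative only.

```python
import math


def variance_eq_5_2(frame_units, unit_variances, sample_sizes):
    """Equation (5.2) without the finite population correction."""
    return sum(
        N_j ** 2 * s2_j / n_j
        for N_j, s2_j, n_j in zip(frame_units, unit_variances, sample_sizes)
    )


def variance_eq_5_5(frame_areas, a, b, unit_areas, sample_sizes):
    """Equation (5.5): sum over strata of a_j * A_j^2 * x_j^(b_j - 2) / n_j."""
    return sum(
        a_j * A_j ** 2 * x_j ** (b_j - 2) / n_j
        for A_j, a_j, b_j, x_j, n_j in zip(frame_areas, a, b, unit_areas, sample_sizes)
    )


# Hypothetical two-stratum inputs (areas in acres).
frame_areas = [2.0e6, 5.0e5]
a_params, b_params = [1.5, 3.0], [1.7, 1.6]
unit_areas = [25000.0, 10000.0]
sample_sizes = [30, 12]

# (5.3) and (5.4) applied stratum by stratum.
frame_units = [A_j / x_j for A_j, x_j in zip(frame_areas, unit_areas)]
unit_variances = [a_j * x_j ** b_j for a_j, x_j, b_j in zip(a_params, unit_areas, b_params)]

v52 = variance_eq_5_2(frame_units, unit_variances, sample_sizes)
v55 = variance_eq_5_5(frame_areas, a_params, b_params, unit_areas, sample_sizes)
print(v52, v55, math.isclose(v52, v55, rel_tol=1e-9))
```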


A cost function that appears more realistic in the case of acquiring and
processing (i.e., estimating sampling unit level crop acreages from)
satellite-based data is the following:

$$C = \sum_{j=1}^{L} n_j (C_{Bj} + x_j C_{wj}) \qquad (5.6)$$

where $n_j$ and $x_j$ are as described earlier and

    $C_{Bj}$ = the cost per sampling unit in stratum j regardless of its
               size (i.e., overhead costs, etc.),
    $C_{wj}$ = the cost per elemental unit (one acre in this study) making
               up the sampling units in stratum j.

Using the Lagrangian multiplier method to minimize C subject to the variance
in equation (5.5) being held fixed results in closed-form values for $x_j$,
$n_j$, and $C_{min}$, given as equations (5.7) - (5.9).

[Equations (5.7) - (5.9) are not recoverable from the source.]

Although empirical results associated with equations (5.7) - (5.9) are not
available at the time of this writing, further investigation is underway and
results are expected to be available in the future.
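
The closed forms of equations (5.7) - (5.9) do not survive in this copy, so what follows is only a sketch of how the Lagrangian step can be carried out, not a reproduction of the authors' result. Assuming 1 < b_j < 2 and treating n_j as continuous, setting the partial derivatives of the Lagrangian with respect to x_j and n_j to zero gives x_j = (2 - b_j) C_Bj / ((b_j - 1) C_wj), with n_j proportional to the square root of (per-unit variance term) / (per-unit cost) and scaled so that (5.5) equals a target variance. All names below, including `target_variance`, are hypothetical.

```python
import math


def optimal_design(frame_areas, a, b, c_base, c_acre, target_variance):
    """Return (x, n, cost) minimizing cost (5.6) for a fixed variance (5.5).

    Derived here under the assumption 1 < b_j < 2; not taken from the paper.
    """
    # Stationary unit size in each stratum.
    x = [(2 - b_j) * cb / ((b_j - 1) * cw) for b_j, cb, cw in zip(b, c_base, c_acre)]
    # Cost of one sampling unit of that size, as in (5.6).
    unit_cost = [cb + x_j * cw for x_j, cb, cw in zip(x, c_base, c_acre)]
    # Numerator of each stratum's term in (5.5): a_j * A_j^2 * x_j^(b_j - 2).
    unit_var = [
        a_j * A_j ** 2 * x_j ** (b_j - 2)
        for a_j, A_j, x_j, b_j in zip(a, frame_areas, x, b)
    ]
    # Square root of the multiplier, fixed by the variance constraint.
    root_lambda = sum(math.sqrt(v * c) for v, c in zip(unit_var, unit_cost)) / target_variance
    n = [root_lambda * math.sqrt(v / c) for v, c in zip(unit_var, unit_cost)]
    cost = sum(n_j * c for n_j, c in zip(n, unit_cost))
    return x, n, cost


# Hypothetical single-stratum example.
x_opt, n_opt, c_min = optimal_design([1.0e6], [2.0], [1.7], [100.0], [0.01], 1.0e8)
print(x_opt, n_opt, c_min)
```

In practice n_j would be rounded up to an integer, and the unit size would still be bounded by the engineering constraints discussed in Section 2.0.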

6.0 Summary and Conclusions 

Empirical results from remotely sensed (satellite-acquired) data indicate
that the power function (various forms of which were initially, and successfully,
utilized by Smith [1938], Jessen [1942], and others [refs. 1, 4, 7, and 12]) is
satisfactory in modeling the within-stratum between-cluster variance for a
surprisingly large range of sampling cluster sizes. This modeled form was
then utilized to gain insight into the relationship between the sampling rate
and the sampling unit size under two separate cost structures.

Although this paper is devoted entirely to modeling the sampling
variance, this should not be taken to mean that measurement error variance is
insignificant and may be ignored. Further effort is justified (and currently
underway) to model variations due to measurement error. Sufficient
information exists from the measurement results obtained using the sampling
unit crop area measurement procedure utilized at NASA/JSC (ref. 14) to warrant
further investigation into characterizing this variance as a
function of sampling unit size also. Until further insight is gained into
this relationship, optimal sampling unit sizes will
continue to be determined primarily from ranges dictated by various engineering
and/or other system constraints.

APPENDIX A

In the Large Area Crop Inventory Experiment, a generalized Neyman
allocation was developed and used. The formula for the total allocation
utilizing this allocation is given by

$$n = \frac{\left[ \sum_{j=1}^{L} \sum_{k=1}^{J} N_{jk} S_{jk} \sqrt{\tau_j^2 + Y_j^2} \right]^2}{CV^2(\hat{P}) \hat{P}^2 + \sum_{j=1}^{L} \sum_{k=1}^{J} N_{jk} S_{jk}^2 (\tau_j^2 + Y_j^2) - \sum_{j=1}^{L} A_j^2 \tau_j^2} \qquad (A.1)$$

where $N_{jk}$ is the number of 5x6 nautical mile segments constituting
stratum j (a yield stratum) and substratum k (the intersection of yield strata
with agrophysical strata and states), $S_{jk}^2$ is the segment-to-segment crop
area variance for stratum j and substratum k, $\tau_j^2$ is the yield variance
for stratum j, $Y_j$ is the estimated yield for the jth stratum, and $\hat{P}$
is the estimated production for the U.S. Great Plains. For a derivation of
formula A.1 see Appendix b or d of LACIE: Crop Assessment Subsystem (CAS)
Requirement, October 1977, NASA/JSC.
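
A minimal sketch of evaluating the total allocation in formula A.1 is given below. The input layout and names are hypothetical, and the per-stratum quantity entering the last term of the denominator is taken here to be the estimated crop area of stratum j, which is an assumption because the text does not define it.

```python
import math


def total_allocation(substrata, strata, cv_production, production):
    """Total allocation n of formula A.1.

    substrata: list of (j, N_jk, S_jk) tuples, one per substratum k of stratum j.
    strata:    dict mapping j to (area_j, yield_j, yield_sd_j); area_j is assumed
               to be the estimated crop area of stratum j.
    """
    def weight(j):
        _, yield_j, yield_sd_j = strata[j]
        return yield_sd_j ** 2 + yield_j ** 2

    numerator = sum(N * S * math.sqrt(weight(j)) for j, N, S in substrata) ** 2
    fpc_term = sum(N * S ** 2 * weight(j) for j, N, S in substrata)
    yield_term = sum(area ** 2 * sd ** 2 for area, _, sd in strata.values())
    denominator = (cv_production * production) ** 2 + fpc_term - yield_term
    return numerator / denominator


# Hypothetical example: one yield stratum with two substrata and no yield variance,
# in which case A.1 reduces to the classical Neyman total sample size.
demo_substrata = [(0, 100, 2.0), (0, 50, 4.0)]
demo_strata = {0: (500.0, 1.0, 0.0)}
print(total_allocation(demo_substrata, demo_strata, cv_production=0.1, production=1000.0))
```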
Suppose the segment-to-segment crop area variance using the 5x6 nautical
mile segment for stratum j and substratum k is $S_{jk}^2$ and the
within-substratum variance is given by

$$S_{jk}^2(X) = A_{jk} X^{B_{jk}} \qquad (A.2)$$

where X is the sampling unit size. Setting X equal to the area of the 5x6
nautical mile segment, $a_0$, and solving for $A_{jk}$ yields

$$S_{jk}^2(X) = \left( S_{jk}^2 / a_0^{B_{jk}} \right) X^{B_{jk}} \qquad (A.3)$$

The number of sampling units of size X is given by

$$N_{jk}(X) = N_{jk} [a_0 / X] \qquad (A.4)$$

Substituting equations A.3 and A.4 into equation A.1 yields the total
allocation utilizing sampling units of size X,

$$n(X) = \frac{\left[ \sum_{j=1}^{L} \sum_{k=1}^{J} N_{jk} S_{jk} (a_0/X)^{1 - B_{jk}/2} \sqrt{\tau_j^2 + Y_j^2} \right]^2}{CV^2(\hat{P}) \hat{P}^2 + \sum_{j=1}^{L} \sum_{k=1}^{J} N_{jk} S_{jk}^2 (X/a_0)^{B_{jk} - 1} (\tau_j^2 + Y_j^2) - \sum_{j=1}^{L} A_j^2 \tau_j^2} \qquad (A.5)$$

Replacing $B_{jk}$ with a common value B yields

$$n(X) = \frac{(a_0/X)^{2 - B} \left[ \sum_{j=1}^{L} \sum_{k=1}^{J} N_{jk} S_{jk} \sqrt{\tau_j^2 + Y_j^2} \right]^2}{CV^2(\hat{P}) \hat{P}^2 + (X/a_0)^{B - 1} \sum_{j=1}^{L} \sum_{k=1}^{J} N_{jk} S_{jk}^2 (\tau_j^2 + Y_j^2) - \sum_{j=1}^{L} A_j^2 \tau_j^2} \qquad (A.6)$$


The second term in the denominator of the above equation is dominated by the
difference of the first and third terms, since its presence is due to the
finite population correction factor. Thus, an approximation for the total
allocation utilizing a sampling unit of size X is given by

$$n(X) = K (a_0 / X)^{2 - B} \qquad (A.7)$$

where K is the total allocation associated with the 5x6 nautical mile segment.
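
The approximation in A.7 is simple to evaluate. The sketch below takes K = 487 and a_0 = 25,463 acres from the first row of Table 5.4; the exponent B = 1.73 is an illustrative value chosen only because it roughly reproduces the tabulated allocations, not a parameter reported in the paper.

```python
def approximate_allocation(x_acres, k_segments, segment_acres, b):
    """Equation (A.7): n(X) = K * (a_0 / X) ** (2 - B)."""
    return k_segments * (segment_acres / x_acres) ** (2 - b)


K_FULL_SEGMENT = 487        # total allocation for the full segment (Table 5.4)
SEGMENT_ACRES = 25463       # segment size in acres as listed in Table 5.4
B_ILLUSTRATIVE = 1.73       # assumed value for illustration only

for size in (25463, 22918, 12732, 2546):
    n_x = approximate_allocation(size, K_FULL_SEGMENT, SEGMENT_ACRES, B_ILLUSTRATIVE)
    print(size, round(n_x))
```

Because 2 - B is small when B is close to 2, the required number of sampling units grows only slowly as the unit shrinks, which is the pattern visible in Table 5.4.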